Worst-case complexity and empirical evaluation of artificial intelligence methods for unsupervised word sense disambiguation

Authors

  • Didier Schwab
  • Jérôme Goulian
  • Andon Tchechmedjiev
Abstract

Word Sense Disambiguation (WSD) is a difficult problem for NLP. Algorithms that aim to solve it typically focus on the quality of the disambiguation alone and require considerable computational time. In this article we study three unsupervised stochastic algorithms for WSD: a Genetic Algorithm (GA) and a Simulated Annealing algorithm (SA) from the state of the art, and our own Ant Colony Algorithm (ACA). The comparison is made both in terms of worst-case computational complexity and in terms of empirical performance: F1 scores, execution time and the evaluation of the semantic relatedness measure. We find that the worst-case complexity of GA is a factor of 100 higher than that of SA; however, it is difficult to make a direct comparison with ACA. We estimate the best parameters manually for SA and GA, but automatically for ACA (made possible by its short execution time). We find that ACA leads to shorter execution times (by factors of 10 and 100 compared with SA and GA, respectively) as well as better results. Using different voting strategies, we observe a small increase in the F1 scores of SA and GA and significant improvements in the results of ACA. With the latter, we surpass the First Sense baseline and come close to the results of supervised systems on the coarse-grained all-words task from SemEval 2007.
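The three algorithms compared in the article all cast WSD as a combinatorial optimisation problem: choose one sense per word so that a global semantic relatedness score over the chosen senses is maximised. As a rough illustration only, not the authors' implementation, a minimal simulated-annealing sketch in Python could look like the following; the relatedness function, the cooling parameters and all identifiers are assumptions made for the example.

    import math
    import random

    def simulated_annealing_wsd(words, senses, relatedness,
                                t0=100.0, cooling=0.95, steps_per_temp=50, t_min=0.1):
        """Illustrative simulated-annealing sketch for WSD (not the authors' exact algorithm).

        words       -- list of target words to disambiguate
        senses      -- dict mapping each word to its list of candidate senses
        relatedness -- function(sense_a, sense_b) -> float, e.g. a Lesk-style overlap score
        """
        # Start from a random sense assignment: one sense index per word.
        config = {w: random.randrange(len(senses[w])) for w in words}

        def score(cfg):
            # Global objective: sum of pairwise relatedness between the chosen senses.
            total = 0.0
            for i, wi in enumerate(words):
                for wj in words[i + 1:]:
                    total += relatedness(senses[wi][cfg[wi]], senses[wj][cfg[wj]])
            return total

        current = score(config)
        t = t0
        while t > t_min:
            for _ in range(steps_per_temp):
                # Neighbour move: re-draw the sense of one randomly chosen word.
                w = random.choice(words)
                old = config[w]
                config[w] = random.randrange(len(senses[w]))
                candidate = score(config)
                delta = candidate - current
                # Always accept improvements; accept degradations with probability exp(delta / t).
                if delta >= 0 or random.random() < math.exp(delta / t):
                    current = candidate
                else:
                    config[w] = old  # revert the move
            t *= cooling  # geometric cooling schedule
        return {w: senses[w][config[w]] for w in words}

Because each run of such a stochastic algorithm may return a different assignment, several independent runs can be combined by a voting strategy; this is what the abstract refers to when reporting the improved F1 scores, especially for ACA.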

Similar Resources

RACAI: Unsupervised WSD Experiments @ SemEval-2, Task 17

This paper documents the participation of the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI) in Task 17, All-words Word Sense Disambiguation on a Specific Domain, of the SemEval-2 competition. We describe three unsupervised WSD systems that make extensive use of the Princeton WordNet (WN) structure and WordNet Domains in order to perform the disambiguation. ...

Kernel Fuzzy C-Means Clustering for Word Sense Disambiguation in

Word sense disambiguation (WSD) in biomedical texts is important. The majority of existing research focuses on supervised learning methods and knowledge-based approaches. Implementing these methods requires a substantial human-annotated corpus, which is not easily obtained. In this paper, we developed an unsupervised system for WSD in biomedical texts. First, we predefine the number of ...

Kim, Su Nam and Timothy Baldwin (2007). Disambiguating Noun Compounds. In Proceedings of the Twenty-Second Conference on Artificial Intelligence (AAAI-07), Vancouver, Canada, pp. 901-906

This paper is concerned with the interaction between word sense disambiguation and the interpretation of noun compounds (NCs) in English. We develop techniques for disambiguating word sense specifically in NCs, and then investigate whether word sense information can aid in the semantic relation interpretation of NCs. To disambiguate word sense, we combine the one sense per collocation heuristic...

Co-training and Self-training for Word Sense Disambiguation

This paper investigates the application of co-training and self-training to word sense disambiguation. Optimal and empirical parameter selection methods for co-training and self-training are investigated, with various degrees of error reduction. A new method that combines co-training with majority voting is introduced, with the effect of smoothing the bootstrapping learning curves and improving ...
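The majority-voting step mentioned in this abstract (and the voting strategies used in the main article above) can be sketched in a few lines; the shape of the predictions argument is an assumption made for illustration.

    from collections import Counter

    def majority_vote(predictions):
        """Combine sense assignments from several classifiers or runs by majority vote.

        predictions -- non-empty list of dicts, each mapping a word occurrence to a predicted sense
        """
        return {
            w: Counter(p[w] for p in predictions).most_common(1)[0][0]
            for w in predictions[0]
        }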

Learning Probabilistic Models of Word Sense Disambiguation

This dissertation presents several new methods of supervised and unsupervised learning of word sense disambiguation models. The supervised methods focus on performing model searches through a space of probabilistic models, and the unsupervised methods rely on the use of Gibbs Sampling and the Expectation Maximization (EM) algorithm. In both the supervised and unsupervised case, the Naive Bayesi...
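As a rough, self-contained illustration of unsupervised sense induction with EM and a Naive Bayes (multinomial mixture) model, the sketch below estimates sense distributions from bag-of-words contexts; the function name, the add-one smoothing and the parameter choices are assumptions for the example, not the dissertation's actual procedure.

    import numpy as np

    def em_naive_bayes_senses(contexts, n_senses, n_iters=20, seed=0):
        """Illustrative EM for a Naive Bayes (multinomial mixture) sense-induction model.

        contexts -- array of shape (n_instances, vocab_size) with bag-of-words counts
                    of the context of each occurrence of the ambiguous word
        n_senses -- number of senses to induce (fixed in advance)
        """
        rng = np.random.default_rng(seed)
        n_instances, _ = contexts.shape
        # Soft random initialisation of P(sense | instance).
        resp = rng.dirichlet(np.ones(n_senses), size=n_instances)
        for _ in range(n_iters):
            # M-step: re-estimate P(sense) and P(word | sense) with add-one smoothing.
            priors = resp.sum(axis=0) / n_instances
            word_probs = resp.T @ contexts + 1.0
            word_probs /= word_probs.sum(axis=1, keepdims=True)
            # E-step: responsibilities proportional to each instance's likelihood under each sense.
            log_lik = contexts @ np.log(word_probs).T + np.log(priors)
            log_lik -= log_lik.max(axis=1, keepdims=True)  # numerical stability
            resp = np.exp(log_lik)
            resp /= resp.sum(axis=1, keepdims=True)
        return resp.argmax(axis=1)  # hard sense label per occurrence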


Journal title:
  • Int. J. Web Eng. Technol.

Volume: 8  Issue: -

Pages: -

Publication date: 2013